Duration modeling techniques for continuous speech recognition

Authors

Janne Pylkkönen

Mikko Kurimo

Abstract

Phone durations play a significant part in the comprehension of speech. Nevertheless, duration information is still mostly disregarded in automatic speech recognizers, because they rely on hidden Markov models (HMMs), which model phone durations poorly. Previous results have shown that different approaches to explicit duration modeling improve isolated word recognition in English. However, a unified comparison of the methods has not been reported. In this paper, three techniques for explicit duration modeling are compared and evaluated in a large-vocabulary continuous speech recognition task. The target language was Finnish, in which phone durations are especially important for proper understanding. The results show that the choice of duration modeling technique depends on the speed requirements of the recognizer. The best technique required a slightly longer running time than a recognizer without an explicit duration model, but achieved an 8% relative improvement in the letter error rate.
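To illustrate why a plain HMM handles durations poorly, the following minimal sketch compares the implicit duration distribution of a single HMM state with an explicit parametric one. A state with self-loop probability a stays for exactly d frames with probability a^(d-1)(1-a), a geometric distribution whose most likely duration is always one frame. The gamma density used for the explicit model is an assumption chosen for illustration only; the abstract does not name the three techniques compared, and all function names and parameter values here are hypothetical.

```python
import math


def geometric_duration_pmf(d, self_loop):
    """Implicit duration model of one HMM state.

    Probability of staying exactly d frames when the state's
    self-transition probability is `self_loop`.
    """
    if d < 1:
        return 0.0
    return (self_loop ** (d - 1)) * (1.0 - self_loop)


def gamma_duration_logpdf(d, shape, scale):
    """Explicit duration score (illustrative assumption: gamma density).

    Returns the log density of a gamma distribution evaluated at a
    duration of d frames. Such a score could be added to the acoustic
    log-likelihood whenever the decoder leaves a phone.
    """
    if d <= 0:
        return float("-inf")
    return (
        (shape - 1.0) * math.log(d)
        - d / scale
        - math.lgamma(shape)
        - shape * math.log(scale)
    )


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only for illustration.
    a = 0.8
    for d in (1, 5, 10):
        print(d, geometric_duration_pmf(d, a), gamma_duration_logpdf(d, 5.0, 2.0))
```

The geometric model can only decay with duration, whereas the gamma density peaks at (shape - 1) * scale frames, so it can favor a typical phone length over very short or very long ones.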

Similar resources

Improved Bayesian Training for Context-Dependent Modeling in Continuous Persian Speech Recognition

Context-dependent modeling is a widely used technique for better phone modeling in continuous speech recognition. While different types of context-dependent models have been used, triphones have been known as the most effective ones. In this paper, a Maximum a Posteriori (MAP) estimation approach has been used to estimate the parameters of the untied triphone model set used in data-driven clust...

Speaker Adaptation in Continuous Speech Recognition Using MLLR-Based MAP Estimation

A variety of methods are used for speaker adaptation in speech recognition. In some techniques, such as MAP estimation, only the models with available training data are updated. Hence, large amounts of training data are required in order to have significant recognition improvements. In some others, such as MLLR, where several general transformations are applied to model clusters, the results ar...

A Parallel Implementation of a Hidden Markov Model with Duration Modeling for Speech Recognition

Hidden Markov models (HMMs) are currently the most successful paradigm for speech recognition. Although explicit duration continuous HMMs more accurately model speech than HMMs with implicit duration modeling, the cost of accurate duration modeling is often considered prohibitive. This paper describes a parallel implementation of an HMM with explicit duration modeling for spoken language recogn...

On use of duration modeling for continuous digits speech recognition

In this paper, we describe our duration modeling techniques for an HMM-based speech recognizer. With this approach, a large number of deletion and insertion errors can be eliminated in a Mandarin continuous digit recognizer. We present a simple duration penalty function that can be explicitly combined with Viterbi beam search at negligible additional computational cost. Different parametric distrib...

Publication date: 2004
